Putting a Value on Comparable Data

Author

  • Kevin Knight
Abstract

Machine translation began in 1947 with an influential memo by Warren Weaver. In that memo, Weaver noted that human code-breakers could transform ciphers into natural language (e.g., into Turkish) without access to parallel ciphertext/plaintext data, and without knowing the plaintext language's syntax and semantics. Simple word and letter statistics seemed to be enough for the task. Weaver then predicted that such statistical methods could also solve a tougher problem, namely language translation. This raises the question: can sufficient translation knowledge be derived from comparable (non-parallel) data? In this talk, I will discuss initial work in treating foreign language as a code for English, where we assume the code to involve both word substitutions and word transpositions. In doing so, I will quantitatively estimate the value of non-parallel data, versus parallel data, in terms of end-to-end accuracy of trained translation systems. Because we still know very little about solving word-based codes, I will also describe successful techniques and lessons from the realm of letter-based ciphers, where the non-parallel resources are (1) enciphered text, and (2) unrelated plaintext. As an example, I will describe how we decoded the Copiale cipher with limited "computer-like" knowledge of the plaintext language. The talk will wrap up with challenges in exploiting comparable data at all levels: letters, words, phrases, syntax, and semantics.

Publication date: 2011